fix(series): arithmetics for Series[Any] #1343

cmp0xff · 2025-08-22T12:34:30Z

This PR implements the ideas from #1274 (comment) and #1274 (comment).

Tests added: Please use assert_type() to assert the type of any return value

Dr-Irv

The only issues I see concern the code you added inside `if TYPE_CHECKING_INVALID_USAGE`.

The concern here is that some of these lines will execute fine if the Series comes from a DataFrame and holds the right type, even though statically it is only known as Series[Any]. And vice versa.

I think I would prefer that, if you are doing an operation like subtraction that will sometimes work and sometimes not, and the inferred type of one of the operands is Series[Any], we detect that as a typing problem. But we need to be selective.

For example,

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])})
sa = df["a"]
sb = df["b"]
sa - pd.Timestamp("1/1/2024")  # fails at runtime
sb - pd.Timestamp("1/1/2024")  # works at runtime
```

Here sa and sb are Series[Any] (mypy) or Series[Unknown] (pyright). So the typing either has to accept both cases or reject both cases.

I think we have to be selective here, and probably disallow subtraction with an untyped Series when the other argument is known to be time-related (Timestamp, Timedelta and the associated Series) or is a string or Series[str]. I think the current stubs are more permissive, but now I'm not sure that's the right thing to do.

tests/series/arithmetic/test_sub.py

cmp0xff · 2025-08-23T08:39:43Z

Hi @Dr-Irv, thank you for drafting the plan.

Current plan

> I think I would prefer that, if you are doing an operation like subtraction that will sometimes work and sometimes not, and the inferred type of one of the operands is Series[Any], we detect that as a typing problem. But we need to be selective.

> I think we have to be selective here, and probably disallow subtraction with an untyped Series when the other argument is known to be time-related (Timestamp, Timedelta and the associated Series) or is a string or Series[str].

I would like to summarise this typing plan as follows:

- When the calculation can give a runtime error, typing shows an error or Never.
- Certain cases are exceptions.

`Timestamp` and `Timedelta`: permissive or forbidding

With this typing plan, I have the following examples in mind:

- Series[Any] (int) - TimestampSeries -> error at type checking, error at runtime
- Series[Any] (Timestamp) - TimestampSeries -> error at type checking, TimedeltaSeries at runtime

As a user, I probably do not want the static type checker to aggressively point out a potential problem. The less permissive and more forbidding the stub is, the more aggressive the static type checker becomes. It seems better to me to allow both cases at the static type checking stage; otherwise the user may need to manually silence the type checker in many cases.

`int`: exceptions to the plan

"We need to be selective" is important in the plan, because we also have

- Series[Any] (int) + Series[int] -> Series[Any] at type checking, Series[int] at runtime
- Series[Any] (str) + Series[int] -> Series[Any] at type checking, error at runtime

Currently we are happy with the stub giving us Series[Any] for adding Series[Any] to Series[int]. This is an exception, which may potentially confuse the user.

Proposing a consistent plan

I would like to propose a new typing plan as follows:

- When the calculation can give several typing results or a runtime error, typing shows Series[Any].
- When the calculation gives exactly one typing result, say Series[R], or a runtime error, typing shows Series[R].
- When the calculation always gives a runtime error, typing shows an error or Never.
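To make the three rules above concrete, here is a small, purely illustrative Python model, not pandas-stubs code. It takes the set of result types an operation could validly produce for any possible content of the `Series[Any]` operand and applies the rules in order. The function name `resolve_result` and the string labels are hypothetical.

```python
SERIES_ANY = "Series[Any]"
NEVER = "Never"


def resolve_result(valid_results: set[str]) -> str:
    """Return the static result type under the proposed "consistent plan".

    `valid_results` holds every result type the operation can produce for
    some runtime content of the Series[Any] operand (runtime errors excluded).
    """
    if not valid_results:
        # Rule 3: the calculation always fails at runtime -> error / Never.
        return NEVER
    if len(valid_results) == 1:
        # Rule 2: exactly one valid result -> use it (runtime may still fail).
        (only,) = valid_results
        return only
    # Rule 1: several valid results -> fall back to Series[Any].
    return SERIES_ANY


# Series[Any] - TimestampSeries: a timedelta result is the only valid outcome.
print(resolve_result({"TimedeltaSeries"}))  # TimedeltaSeries
# Series[Any] + Series[int]: e.g. Series[int] or Series[float] are both possible.
print(resolve_result({"Series[int]", "Series[float]"}))  # Series[Any]
# Series[Any] * TimestampSeries: Timestamp is never multiplicative.
print(resolve_result(set()))  # Never
```

The model makes the trade-off visible: rule 2 commits to a precise type even when the operand's real content would make the operation fail, which is exactly the case the examples below flag as "does not catch the potential problem".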

With this typing plan, the previous examples give different results:

- Series[Any] (int) - TimestampSeries -> TimedeltaSeries at type checking, error at runtime (TimedeltaSeries is the only possible valid result, so unfortunately the type checker does not catch the potential problem here)
- Series[Any] (Timestamp) - TimestampSeries -> TimedeltaSeries at type checking, TimedeltaSeries at runtime
- Series[Any] (int) + Series[int] -> Series[Any] at type checking, Series[int] at runtime (no exceptional rule in the plan)
- Series[Any] (str) + Series[int] -> Series[Any] at type checking, error at runtime (Series[float], Series[int] etc. are possible valid results, so unfortunately the type checker does not catch the potential problem here)

Further examples:

- Series[Any] (int) + Series[str] -> Series[str] at type checking, error at runtime (Series[str] is the only possible valid result, so unfortunately the type checker does not catch the potential problem here)
- Series[Any] (str) + Series[str] -> Series[str] at type checking, Series[str] at runtime
- Series[Any] * TimestampSeries -> error / Never at type checking, error at runtime (Timestamp is consistently not multiplicative)

Thank you for reading the lengthy explanation. What do you think?

Dr-Irv · 2025-08-23T18:39:54Z

> Thank you for reading the lengthy explanation. What do you think?

The challenge here is the issue of wide vs narrow types. See https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#narrow-vs-wide-arguments for some writeup I did about that.

Let's consider this example from your list:

> Series[Any] (int) - TimestampSeries -> TimedeltaSeries at type checking, error at runtime

On main today, the following code behaves as you describe: the type checker infers that result is TimedeltaSeries, but it fails at runtime.

```python
import pandas as pd

si = pd.DataFrame({"a": pd.Series([1, 2, 3])})["a"]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
result = si - st
```

I think we do a better service to users if we actually catch this via typing, i.e., for Timedelta, TimedeltaSeries, Timestamp, TimestampSeries, str and Series[str], if they are in a binary operation with a Series[Any] (either before the operator or after the operator), the type checker reports an error. That's telling the user "We don't know how to handle a generic series with another operand that has a specified type", but we are limiting the types we do that with to just the ones I mentioned.

This forces the user to cast the variable si above to Series[int] (in which case we catch the failure), so they know it may fail at runtime.
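A runtime sketch of this point, assuming pandas 2.x behaviour: `typing.cast` returns its argument unchanged, so the cast only changes what the type checker sees, and the subtraction still raises at runtime. The helper name `subtraction_fails` is hypothetical, and the comment about the stubs flagging the line reflects the intent described here rather than a verified checker output.

```python
from typing import cast

import pandas as pd

si = pd.DataFrame({"a": pd.Series([1, 2, 3])})["a"]  # statically Series[Any]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))

# cast() only informs the type checker; at runtime it returns `si` unchanged.
si_int = cast("pd.Series[int]", si)


def subtraction_fails() -> bool:
    """Return True if the int Series minus the datetime Series raises at runtime."""
    try:
        _ = si_int - st  # with the cast, the stubs are expected to flag this line
    except TypeError:
        return True
    return False


print(si_int is si)  # True
print(subtraction_fails())  # True
```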

Let's also consider this example:

```python
import pandas as pd

st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame({"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]})["a"]
result = st - sd
```

In this case, if we adopt my proposal, the type checker would say that st - sd is invalid. But the type of sd is partially unknown, so we are then suggesting that the user do:

```python
from typing import cast
result = st - cast("pd.Series[pd.Timedelta]", sd)
```

which is telling the type checker "I know this is a series of timedeltas"
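As a runtime check of that suggestion (a sketch assuming pandas 2.x behaviour), the cast leaves `sd` untouched and the subtraction succeeds, producing a datetime Series. Every element here lands on the same date because each timedelta matches its offset from 2005-01-01 plus one day.

```python
from typing import cast

import pandas as pd

st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame(
    {"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]}
)["a"]

# The cast is a no-op at runtime; it only tells the type checker what `sd` holds.
result = st - cast("pd.Series[pd.Timedelta]", sd)
print(result.tolist())
```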

I'm choosing what I consider to be a happy medium here between your proposal, and something that would be too narrow (e.g., disallowing Series[Any].__sub__(Series[Any])), by suggesting that if we know the types of ONE of the operands, but not the other, we try to catch the error via static typing.

I should say that the current behavior in the stubs dates from 3 years ago, when we first inherited the project from something Microsoft had started. Now that I have more experience with typing, as well as with using the stubs in my own code, I've come around to finding more things with static type checking when we can, rather than not.

So the summary of my proposal is (with respect to Series) for binary operators a X b, where X is the operator:

1. If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.
2. If a is Series[Any] and b is fully typed, we say that is an error.
3. If a is fully typed, and b is Series[Any], we say that is an error.
4. If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any]), we accept the calculation in typing and don't report an error.

Let me know your thoughts on that.

cmp0xff · 2025-08-24T09:13:46Z

Hi @Dr-Irv, thank you again for the detailed reply.

`TimestampSeries`

> 1. If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.
> 2. If a is Series[Any] and b is fully typed, we say that is an error.
> 3. If a is fully typed, and b is Series[Any], we say that is an error.
> 4. If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any]), we accept the calculation in typing and don't report an error.

I am concerned about the compatibility of points 2 and 3 with point 4.

- We can leave Series[Any] - TimestampSeries without an explicit overload, so that type checking shows an error.
- However, we already have Series[S1] - Series -> Series[Any].
- When we remove TimestampSeries, Series[Any] - Series[Timestamp] will fall back to `Series[S1] - Series`, giving `Series`.
- Following this approach, we would need to explicitly declare Series[Never] - TimestampSeries -> Never for now, and, after the removal of TimestampSeries, Series[Never] - Series[Timestamp] -> Never.
- Otherwise, we will not be able to remove TimestampSeries.

In my proposal, I would rather allow Series[Any] - TimestampSeries as well as Series[Any] - TimedeltaSeries, leaving the user to check the potential error at runtime. We can either give Series[Any] in both cases, or, using my plan, give TimedeltaSeries in the first case.

We have already done the same to allow Series[Any] - Series[int], giving Series[Any], which could also fail at runtime if Series[Any] actually contains str.

Dr-Irv · 2025-08-24T16:21:17Z

> In my proposal, I would rather allow Series[Any] - TimestampSeries as well as Series[Any] - TimedeltaSeries, leaving the user to check the potential error at runtime. We can either give Series[Any] in both cases, or, using my plan, give TimedeltaSeries in the first case.

Or we can catch it in type checking??

> We have already done the same to allow Series[Any] - Series[int], giving Series[Any], which could also fail at runtime if Series[Any] actually contains str.

Yes, this latter point is quite valid. Here's a possible modification of my proposal: (with respect to Series) for binary operators a X b, where X is the operator:

1. If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.
2. If a is Series[Any] and b is Timestamp, Series[Timestamp], Timedelta, Series[Timedelta], str, Series[str], we catch that as a typing error.
3. If a is Timestamp, Series[Timestamp], Timedelta, Series[Timedelta], str, Series[str], and b is Series[Any], we say that is an error.
4. If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any]), we accept the calculation in typing and don't report an error.
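The modified rules can be modelled in a few lines of plain Python. This is only an illustrative sketch, not how pandas-stubs implements them (the stubs express this through overloads): the `Operand` categories and the function name `rejected_for_untyped_operand` are hypothetical, and a Series[Any] combined with some other fully typed operand (e.g. Series[int]) is treated as accepted, in line with the Series[Any] - Series[int] remark above.

```python
from enum import Enum, auto


class Operand(Enum):
    """Hypothetical coarse classification of an operand's static type."""

    UNTYPED = auto()  # Series[Any] / Series[Unknown], or plain Any
    RESTRICTED = auto()  # Timestamp, Timedelta, str, or a Series of one of those
    OTHER_TYPED = auto()  # any other fully typed operand, e.g. Series[int]


def rejected_for_untyped_operand(a: Operand, b: Operand) -> bool:
    """Return True if `a X b` is rejected by rules 2 or 3 of the proposal.

    Rule 1 (both operands fully typed) depends on the concrete types and is
    not modelled here; this function returns False for those pairs.
    Rule 4 (both operands untyped) is always accepted.
    """
    # Order does not matter: rules 2 and 3 are mirror images of each other.
    return {a, b} == {Operand.UNTYPED, Operand.RESTRICTED}


print(rejected_for_untyped_operand(Operand.UNTYPED, Operand.RESTRICTED))  # True
print(rejected_for_untyped_operand(Operand.RESTRICTED, Operand.UNTYPED))  # True
print(rejected_for_untyped_operand(Operand.UNTYPED, Operand.UNTYPED))  # False
print(rejected_for_untyped_operand(Operand.UNTYPED, Operand.OTHER_TYPED))  # False
```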

Is this possible?

cmp0xff · 2025-08-24T20:29:25Z

I tried to implement the plan in e0b5b59. The str part proves to be a bit tricky. I would like to leave it out for now.

Dr-Irv · 2025-08-25T13:03:40Z

> I tried to implement the plan in e0b5b59. The str part proves to be a bit tricky. I would like to leave it out for now.

So should I review now?

cmp0xff · 2025-08-25T13:16:28Z

> So should I review now?

Yes please. I believe I have implemented the ideas in #1343 (comment). I would like to follow the proposal there.

Dr-Irv

I figured out how to handle the operators with str. See 9c972d3

I put the tests I used in one file, figuring you can spread them out accordingly.

The issue is that _ListLike includes Sequence[S1], which matches a str, so the key was to use SequenceNotStr[S1] instead.
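A minimal sketch of why this works, using only the standard library. At runtime a plain `str` is a `Sequence`, so an annotation like `Sequence[S1]` also accepts a string. The protocol below is a simplified, hypothetical take on the `SequenceNotStr` idea (named `SequenceNotStrSketch` to avoid implying it matches the real definition, which has more members): it requires `__contains__` to accept any `object`, while `str.__contains__` only accepts `str`, so a static type checker should reject a `str` argument but accept a `list` or `tuple`.

```python
from collections.abc import Iterator, Sequence
from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)


class SequenceNotStrSketch(Protocol[_T_co]):
    """Sequence-like protocol that a static checker should not match for `str`.

    Parameters are checked contravariantly: `str.__contains__(key: str)` is
    narrower than the `value: object` required here, so `str` does not fit.
    """

    def __getitem__(self, index: int, /) -> _T_co: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[_T_co]: ...
    def __contains__(self, value: object, /) -> bool: ...


def first_item(values: SequenceNotStrSketch[str]) -> str:
    return values[0]


# At runtime a plain str *is* a Sequence, which is why Sequence[S1] matched it.
print(isinstance("abc", Sequence))  # True
print(first_item(["x", "y"]))  # x
# first_item("xy")  # runs, but a static type checker should report an error here
```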

tests/series/arithmetic/test_sub.py

cmp0xff

9c972d3 cherry picked as 5c23f68

tests/series/arithmetic/test_sub.py

Dr-Irv

Pretty close. The only issue is the use of pytest.raises.

I also think it would be worth adding a subsection to https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#use-of-generic-types that explains the philosophy we've agreed on. Namely, since df["a"] has an unknown type, we restrict arithmetic when the other operand is Timedelta, Timestamp, str, etc., or one of their Series variants. Then we tell people to use cast to tell the type checker "I know that this column of my DataFrame has the right type".
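For such a subsection, a sketch of the recommended pattern, reusing the DataFrame from the beginning of this thread: cast the column to the type you know it holds before doing arithmetic with a Timestamp. The runtime values shown assume pandas 2.x behaviour; whether a given stub version accepts the cast expression is not verified here.

```python
from typing import cast

import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])}
)

# df["b"] is statically Series[Any]; the cast records what we know about column "b".
sb = cast("pd.Series[pd.Timestamp]", df["b"])
result = sb - pd.Timestamp("1/1/2024")
print(result.iloc[0])  # 366 days 00:00:00
```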

tests/series/arithmetic/str/test_add.py

tests/series/arithmetic/test_add.py

tests/series/arithmetic/test_sub.py

tests/test_frame.py

Dr-Irv · 2025-08-26T21:37:43Z

There's been another issue with the overloads that has caused the dev version of mypy to fail on the current main. I found the problem, and this set of diffs fixes it, working with both mypy 1.17.1 and the dev version.

Can you incorporate these changes into your PR?

9c972d3...Dr-Irv:pandas-stubs:d3b8f164b2dc469146d43d49fd73eabc7ebe643a

cmp0xff

- philosophy.md: I composed something in 6a4692f
- mypy dev: I cherry-picked d3b8f16 as c4b218e

tests/series/arithmetic/str/test_add.py

tests/series/arithmetic/test_add.py

tests/series/arithmetic/test_sub.py

tests/test_frame.py

Dr-Irv

I still would like to not use the pytest.raises pattern.

I like the docs. Thanks for that.

tests/series/arithmetic/test_add.py

tests/series/arithmetic/test_sub.py

tests/series/arithmetic/test_add.py

Dr-Irv

Thanks @cmp0xff. Very nice contribution.

Question for you - I could do a release with these latest changes, because we have all the arithmetic working, and only have the work to get rid of TimestampSeries and TimedeltaSeries left to do.

So do you think I should release this now, or wait?

cmp0xff · 2025-08-27T21:39:45Z

Hi @Dr-Irv, thank you for asking me; I am honoured.

If I were in the position to release a new version, I would do so now and save dropping TimestampSeries and TimedeltaSeries for the next release. The changes to arithmetic operations have already been significant, and it would be good to get some feedback.

Dr-Irv · 2025-08-27T23:20:51Z

> Hi @Dr-Irv, thank you for asking me; I am honoured.

> If I were in the position to release a new version, I would do so now and save dropping TimestampSeries and TimedeltaSeries for the next release. The changes to arithmetic operations have already been significant, and it would be good to get some feedback.

It's done. Just released 2.3.2.250827

fix(series): arithmetics for Series[Any]

b6cdcf1

cmp0xff mentioned this pull request Aug 22, 2025

refactor: #718 only drop TimestampSeries #1274

Draft

2 tasks

Dr-Irv requested changes Aug 22, 2025

tests/series/arithmetic/test_sub.py Outdated

tests/series/arithmetic/test_sub.py Outdated

fix(comment): pandas-dev#1343 (comment)

1fb597b

chore(typing): update mypy and ty

961d692

fix(comment): without str pandas-dev#1343 (comment)

e0b5b59

cmp0xff changed the title ~~fix(series): arithmetics for Series[Any]~~ fix(series): arithmetics for Series[Any] (Timestamp | Timedelta) Aug 24, 2025

cmp0xff requested a review from Dr-Irv August 25, 2025 13:14

Dr-Irv requested changes Aug 25, 2025

tests/series/arithmetic/test_sub.py Outdated

tests/series/arithmetic/test_sub.py Outdated

Dr-Irv and others added 3 commits August 25, 2025 17:28

handle str ops

5c23f68

feat(series): implement the proposal

8d8691e

Merge branch 'main' into feature/cmp0xff/arithmetics-for-series-any

646cdcd

cmp0xff commented Aug 25, 2025

tests/series/arithmetic/test_sub.py Outdated

tests/series/arithmetic/test_sub.py Outdated

cmp0xff requested a review from Dr-Irv August 25, 2025 20:36

cmp0xff changed the title ~~fix(series): arithmetics for Series[Any] (Timestamp | Timedelta)~~ fix(series): arithmetics for Series[Any] Aug 25, 2025

Dr-Irv requested changes Aug 25, 2025

tests/series/arithmetic/str/test_add.py Outdated

tests/series/arithmetic/test_add.py Outdated

tests/series/arithmetic/test_sub.py Outdated

tests/test_frame.py Outdated

fix(comment): reduce with pytest.raises(AssertionError)

b4235d0

cmp0xff marked this pull request as draft August 26, 2025 15:28

cmp0xff and others added 2 commits August 27, 2025 00:06

doc(comment): pandas-dev#1343 (review) pandas-dev#1343 (comment)

6a4692f

fix for nightly numpy

c4b218e

cmp0xff commented Aug 26, 2025

tests/series/arithmetic/str/test_add.py Outdated

tests/series/arithmetic/test_add.py Outdated

tests/series/arithmetic/test_sub.py Outdated

tests/test_frame.py Outdated

cmp0xff requested a review from Dr-Irv August 26, 2025 22:23

cmp0xff marked this pull request as ready for review August 26, 2025 22:23

Dr-Irv requested changes Aug 27, 2025

tests/series/arithmetic/test_add.py Outdated

tests/series/arithmetic/test_sub.py

tests/series/arithmetic/test_add.py Outdated

fix: Never

3e9c06e

cmp0xff requested a review from Dr-Irv August 27, 2025 18:22

chore: typo

d141a69

Dr-Irv approved these changes Aug 27, 2025

Dr-Irv merged commit 669a258 into pandas-dev:main Aug 27, 2025
13 checks passed

cmp0xff deleted the feature/cmp0xff/arithmetics-for-series-any branch August 27, 2025 21:34
